Assignment VIII: Deep Learning¶
Question 1¶
Use the dataset DEMO_DATA/chinese_name_gender.txt to create a Chinese name gender classifier using deep learning methods. You need to address a few important considerations in the creation of the deep learning classifier.
Please consult the lecture notes and experiment with different architectures of neural networks. In particular, please try combinations of the following types of network layers:
dense layer
embedding layer
RNN layer
bidirectional layer
Please include regularization and dropout to avoid the issue of overfitting.
Please demonstrate how you find the optimal hyperparameters for the neural network using keras-tuner.
Please perform post-hoc analyses on a few cases using LIME for more interpretable results.
Tokenizer¶
By default, the token index 0 is reserved for the padding token.
If oov_token is specified, it defaults to index 1.
Specify num_words for the tokenizer to include only the top N words in the model.
The Tokenizer automatically removes punctuation.
The Tokenizer uses whitespace as the word delimiter.
If every character should be treated as a token, specify char_level=True.
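A minimal sketch of these defaults, using a made-up two-name sample (the names here are illustrative, not from the assignment dataset):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# char_level=True treats each Chinese character as a token
names = ['王小明', '陳美麗']
tokenizer = Tokenizer(char_level=True, oov_token='[UNK]')
tokenizer.fit_on_texts(names)

# Index 0 is reserved for padding; the OOV token takes index 1
print(tokenizer.word_index)

# An unseen character ('大') maps to the OOV index 1
print(tokenizer.texts_to_sequences(['王大明']))
```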
Prepare Input and Output Tensors¶
Like in feature-based machine learning, a computational model only accepts numeric values. It is necessary to convert the raw text into numeric tensors for the neural network.
After we create the Tokenizer, we use the Tokenizer to perform text vectorization, i.e., converting texts into tensors.
In deep learning, words or characters are automatically converted into numeric representations.
In other words, the feature engineering step is fully automatic.
Two Ways of Text Vectorization¶
Texts to Sequences: Integer encoding of tokens in texts and learn token embeddings
Texts to Matrix: One-hot encoding of texts (similar to bag-of-words model)
Method 1: Text to Sequences¶
Vocabulary¶
Padding¶
When padding all texts to a uniform length, consider whether to pad or truncate at the beginning of the sequence (pre) or at the end (post).
Check the padding and truncating parameters in pad_sequences.
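A small illustration of how the two parameters behave on toy sequences:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

seqs = [[5, 2], [7, 3, 9, 4]]

# Defaults are padding='pre' and truncating='pre':
# short sequences get zeros at the front, long ones lose values from the front
print(pad_sequences(seqs, maxlen=3))
# → [[0 5 2]
#    [3 9 4]]

# With 'post', zeros go at the end and truncation removes values from the end
print(pad_sequences(seqs, maxlen=3, padding='post', truncating='post'))
# → [[5 2 0]
#    [7 3 9]]
```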
Method 2: Text to Matrix¶
One-Hot Encoding¶
Text to Matrix (to create bag-of-word representation of each text)
Choose modes: binary, count, or tfidf
names_matrix is in fact a bag-of-characters representation of a name text.
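A toy illustration of texts_to_matrix with a two-name made-up sample; each row is one name, each column counts one character index (column 0 is unused):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

names = ['王小明', '王美麗']
tokenizer = Tokenizer(char_level=True)
tokenizer.fit_on_texts(names)

# mode can be 'binary', 'count', or 'tfidf'
names_matrix = tokenizer.texts_to_matrix(names, mode='count')
print(names_matrix)  # shape: (2 names, vocabulary size + 1)
```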
Model Definition¶
After we have defined our input and output tensors (X and y), we can define the architecture of our neural network model.
For the two vectorized representations of names, we try two different network architectures.
Text to Sequences: Embedding + RNN
Text to Matrix: Fully connected Dense Layers
Model 1: Fully Connected Dense Layers¶
Two fully-connected dense layers with the Text-to-Matrix inputs
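A minimal sketch of such a model, assuming the Text-to-Matrix input has one column per character index (VOCAB_SIZE is a placeholder, not the dataset's actual vocabulary size) and the gender label is binary:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE = 3000  # assumed placeholder for the character vocabulary size

model1 = models.Sequential([
    layers.Input(shape=(VOCAB_SIZE,)),       # bag-of-characters input vector
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid'),   # binary gender output
])
model1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```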
plot_model(model1, show_shapes=True)
Model 2: Embedding + RNN¶
One Embedding Layer + One RNN Layer
With Text-to-Sequence inputs
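A possible sketch of this architecture, where VOCAB_SIZE and MAX_LEN (the padded name length) are illustrative placeholders:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN = 3000, 4  # assumed placeholders

model2 = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),           # integer-encoded character sequence
    layers.Embedding(VOCAB_SIZE, 16),         # learn a 16-dim vector per character
    layers.SimpleRNN(16),
    layers.Dense(1, activation='sigmoid'),
])
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```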

plot_model(model2, show_shapes=True)
Model 3: Regularization and Dropout¶
The previous two examples clearly show overfitting: the models' performance on the validation set starts to stall after the first few epochs.
We can implement regularization and dropouts in our network definition to avoid overfitting.
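One possible way to add these to the Embedding + RNN model (the layer sizes, dropout rates, and L2 strength are illustrative choices):

```python
from tensorflow.keras import layers, models, regularizers

VOCAB_SIZE, MAX_LEN = 3000, 4  # assumed placeholders

model3 = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 16),
    # dropout on inputs and on the recurrent state
    layers.SimpleRNN(16, dropout=0.2, recurrent_dropout=0.2),
    # L2 weight regularization on the dense layer
    layers.Dense(16, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])
model3.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```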
plot_model(model3)
Model 4: Improve the Models¶
In addition to regularization and dropouts, we can further improve the model by increasing the model complexity.
In particular, we can increase the depths and widths of the network layers.
Let’s try stacking two RNN layers.
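A sketch of the stacked variant (sizes are illustrative); the key detail is return_sequences=True, which makes the first RNN emit its full output sequence so the second RNN has something to consume:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN = 3000, 4  # assumed placeholders

model4 = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 16),
    # return_sequences=True passes per-timestep outputs to the next RNN layer
    layers.SimpleRNN(32, return_sequences=True, dropout=0.2),
    layers.SimpleRNN(32, dropout=0.2),
    layers.Dense(1, activation='sigmoid'),
])
model4.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```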
plot_model(model4)
Model 5: Bidirectional¶
Now let’s try a more sophisticated RNN, the LSTM, with bidirectional computation.
And add more nodes to the LSTM layer.
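A sketch of the bidirectional variant (the LSTM width is an illustrative choice); the Bidirectional wrapper runs the LSTM over the sequence forwards and backwards and concatenates the two final states:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN = 3000, 4  # assumed placeholders

model5 = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 16),
    # forward + backward LSTM states are concatenated: output dim is 2 * 64
    layers.Bidirectional(layers.LSTM(64, dropout=0.2)),
    layers.Dense(1, activation='sigmoid'),
])
model5.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```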
plot_model(model5)
Check Embeddings¶
Compared to one-hot encodings of characters, embeddings may encode more information about the characteristics of the characters.
We can extract the embedding layer and apply a dimensionality reduction technique (e.g., t-SNE) to see how the embeddings capture the relationships between characters.
Hyperparameter Tuning¶
Note
Please install the keras-tuner module in your current conda environment:
pip install -U keras-tuner
Like feature-based ML methods, neural networks also come with many hyperparameters, which are left at their default values unless explicitly tuned.
Typical hyperparameters include:
Number of nodes for the layer
Learning Rates
We can utilize the keras-tuner module to fine-tune the hyperparameters.
Steps for Keras Tuner
First, wrap the model definition in a function, which takes a single hp argument.
Inside this function, replace any value we want to tune with a call to a hyperparameter sampling method, e.g., hp.Int() or hp.Choice(). The function should return a compiled model.
Next, instantiate a tuner object, specifying your optimization objective and other search parameters.
Finally, start the search with the search() method, which takes the same arguments as Model.fit() in Keras.
When the search is over, we can retrieve the best model and a summary of the results from the tuner.
The max_trials variable represents the number of hyperparameter combinations that will be tested by the tuner.
The executions_per_trial variable is the number of models that should be built and fit for each trial, for robustness purposes.
Explanation¶
Interpret the Model¶
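LimeTextExplainer calls a prediction function on raw strings, so the model_predict_pipeline used below has to tokenize, pad, and return an (n_samples, n_classes) probability array. The sketch below is a hypothetical version of such a pipeline; the tiny untrained tokenizer and model here are illustrative stand-ins, not the trained classifier:

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 4  # assumed padded name length

# Stand-in tokenizer and model (the real ones come from the steps above)
tokenizer = Tokenizer(char_level=True, oov_token='[UNK]')
tokenizer.fit_on_texts(['王小明', '陳美麗'])
model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(len(tokenizer.word_index) + 1, 8),
    layers.SimpleRNN(8),
    layers.Dense(1, activation='sigmoid'),
])

def model_predict_pipeline(texts):
    # LIME passes perturbed raw strings; convert them the same way as training data
    seqs = tokenizer.texts_to_sequences(texts)
    padded = pad_sequences(seqs, maxlen=MAX_LEN)
    p = model.predict(padded)            # (n, 1) sigmoid outputs
    return np.hstack([1 - p, p])         # two-column class probabilities
```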
from lime.lime_text import LimeTextExplainer
explainer = LimeTextExplainer(class_names=['Male'], char_level=True)
exp = explainer.explain_instance(
X_test_texts[text_id], model_predict_pipeline, num_features=100, top_labels=1)
exp.show_in_notebook(text=True)
exp = explainer.explain_instance(
'陳宥欣', model_predict_pipeline, num_features=100, top_labels=1)
exp.show_in_notebook(text=True)
exp = explainer.explain_instance(
'李安芬', model_predict_pipeline, num_features=2, top_labels=1)
exp.show_in_notebook(text=True)
exp = explainer.explain_instance(
'林月名', model_predict_pipeline, num_features=2, top_labels=1)
exp.show_in_notebook(text=True)
exp = explainer.explain_instance(
'蔡英文', model_predict_pipeline, num_features=2, top_labels=1)
exp.show_in_notebook(text=True)